Supervised detection of anomalous light-curves in massive astronomical catalogs
Abstract
The development of synoptic sky surveys has led to a massive amount of data for which the resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all the information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. By leaving one of the classes out of the training set we perform a validity test and show that when the random forest classifier attempts to classify unknown light-curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive datasets given that the training process is performed offline. We tested our algorithm on 20 million light-curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs.
Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up for a deeper analysis.
Subject headings: methods: data analysis – methods: statistical – stars: statistics – stars: variables: general – catalogs
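The scoring step described in the abstract (model the classifier's votes on known objects, then flag objects whose votes have a low joint probability) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the vote fractions are synthetic rather than produced by a trained random forest, the votes are discretized into four bins, and the Bayesian network is given a fixed chain structure P(v0) P(v1 | v0) P(v2 | v1) with Laplace smoothing. The abstract does not state the network structure the authors actually used.

```python
import math
from collections import Counter

N_BINS = 4  # assumed discretization of vote fractions (not from the paper)


def to_bin(fraction, n_bins=N_BINS):
    """Map a vote fraction in [0, 1] to a discrete bin index."""
    return min(int(fraction * n_bins), n_bins - 1)


def fit_chain(vote_rows, n_bins=N_BINS, alpha=1.0):
    """Fit a chain-structured Bayesian network over binned vote fractions.

    The chain structure v0 -> v1 -> ... is an illustrative assumption.
    alpha is the Laplace smoothing constant.
    """
    binned = [[to_bin(v, n_bins) for v in row] for row in vote_rows]
    n_classes = len(binned[0])
    return {
        "n": len(binned),
        "n_bins": n_bins,
        "alpha": alpha,
        # Counts for the root node P(v0).
        "first": Counter(row[0] for row in binned),
        # Counts for each conditional P(v_k | v_{k-1}).
        "pairs": [Counter((row[k - 1], row[k]) for row in binned)
                  for k in range(1, n_classes)],
        "parents": [Counter(row[k - 1] for row in binned)
                    for k in range(1, n_classes)],
    }


def log_joint(model, votes):
    """Log joint probability of one object's vote fractions under the chain."""
    b = [to_bin(v, model["n_bins"]) for v in votes]
    a, nb = model["alpha"], model["n_bins"]
    lp = math.log((model["first"][b[0]] + a) / (model["n"] + a * nb))
    for k in range(1, len(b)):
        num = model["pairs"][k - 1][(b[k - 1], b[k])] + a
        den = model["parents"][k - 1][b[k - 1]] + a * nb
        lp += math.log(num / den)
    return lp


# Synthetic training votes: known objects get confident votes for one of 3 classes.
train_votes = ([[0.90, 0.05, 0.05]] * 30
               + [[0.05, 0.90, 0.05]] * 30
               + [[0.05, 0.05, 0.90]] * 30)
model = fit_chain(train_votes)

typical_score = log_joint(model, [0.90, 0.05, 0.05])  # looks like a known class
unusual_score = log_joint(model, [0.34, 0.33, 0.33])  # votes split across classes

print("typical:", typical_score)
print("unusual:", unusual_score)
```

In this toy setting the object with evenly split votes receives a lower log joint probability than the confidently classified one, so ranking objects by this score and inspecting the lowest values mirrors the outlier criterion in the abstract. In practice the vote fractions would come from the trained classifier, and the network structure would be learned or chosen to fit the data.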
Similar resources
Anomaly Detection for Astronomical Data
Modern astronomical observatories can produce massive amounts of data that are beyond the capability of researchers to even glance at. These scientific observations present both great opportunities and challenges for astronomers and machine learning researchers. In this project we address the problem of detecting anomalies/novelties in these large-scale astronomical data sets. Two types ...
Gravitational and Cosmological Spectral Shift with Remote Quantum States
A class of coordinate systems is found for Friedmann Cosmologies with local gravity such that it is possible to formulate quantum theory over astronomical and cosmological distances. When light from distant objects is treated as a quantum motion, new predictions are found for cosmological redshift and lensing. Good agreement is found between predictions and supernova redshifts for a closed Fri...
Finding outlier light-curves in catalogs of periodic variable stars
We present a methodology to discover outliers in catalogs of periodic light-curves. We use cross-correlation as a measure of “similarity” between two individual light-curves and then classify the light-curves with the lowest average “similarity” as outliers. We performed the analysis on catalogs of variable stars of known type from the MACHO and OGLE projects and established that our method correctly ide...
On GPU-Based Nearest Neighbor Queries for Large-Scale Photometric Catalogs in Astronomy
Nowadays astronomical catalogs contain patterns of hundreds of millions of objects with data volumes in the terabyte range. Upcoming projects will gather such patterns for several billions of objects with peta- and exabytes of data. From a machine learning point of view, these settings often yield unsupervised, semi-supervised, or fully supervised tasks, with large training and huge test sets. Re...
Semi-supervised Learning for Anomalous Trajectory Detection
A novel learning framework is proposed for anomalous behaviour detection in a video surveillance scenario, so that a classifier which distinguishes between normal and anomalous behaviour patterns can be incrementally trained with the assistance of a human operator. We consider the behaviour of pedestrians in terms of motion trajectories, and parametrise these trajectories using the control poin...
Journal title:
- CoRR
Volume: abs/1404.4888
Pages: -
Publication date: 2014